Abstract

Novel algorithms for object recognition are described that directly recover
the transformations relating the image to its model. Unlike methods fitting
the conventional framework, these new methods do not require exhaustive search
for each feature correspondence in order to solve for the transformation. Yet
they allow simultaneous object identification and recovery of the
transformation. Given hypothesized corresponding regions in the model and data
(2D views), which are from planar surfaces of the 3D objects, these methods
allow direct computation of the parameters of the transformation by which the
data may be generated from the model. We propose two algorithms: one based on
invariants derived from no higher than second and third order moments of the
image, the other via a combination of the affine properties of geometrical and
differential attributes of the image. Empirical results on natural images
demonstrate the effectiveness of the proposed algorithms. A sensitivity
analysis of the algorithm is presented. We demonstrate in particular that the
differential method is quite stable against perturbations, although not
without some error, when compared with conventional methods. We also
demonstrate mathematically that even a single point correspondence suffices,
theoretically at least, to recover affine parameters via the differential
method.
1 Introduction

Object recognition is one of the central problems in computer vision. The task
of model-based object recognition (e.g., [4]) is to find the object model in
the stored library that best fits the information from the given image. The
most common methods of model-based object recognition fall into two categories
from the point of view of how the objects are represented and how they are
matched.

The first represents objects by a set of local geometrical features, such as
vertices that can be fairly stably obtained over different views, and matches
the model features against the image features, typically in an exhaustive
manner. In general, this type of method simultaneously identifies the object
and recovers the transformation. Equivalently, it recovers the pose of the
object that would yield an image from the object model in which the projected
features best match those found in the given image (e.g., [4, 7, 24, 11]). One
such method is based on the 'hypothesize and test' framework. It first
hypothesizes the minimum number of correspondences between model and image
features that are necessary to compute the transformation (e.g., [7, 24]).
Then, for each hypothesized set of corresponding features, the transformation
is computed and then used to reproject the model features onto the image
features. The hypothesized match is then evaluated based on the number of
projected features that are brought into close proximity to corresponding
image features, and the pair of transformation and model with the best match
is selected.

While this approach has achieved remarkable success in recognizing objects,
particularly in dealing with the problem of occlusions of object surfaces, it
still has practical computational problems, due to its exhaustive search
framework. For example, even with a popular algorithm [7] for matching model
objects with m features with image data with n features, we have to test on
the order of $m^3 n^3$ combinations, where m and n are easily on the order of
several hundreds in natural pictures.

On the other hand, approaches in the second category represent objects by more
global features. One method of this type is the moment invariant method. It
combines different moments to represent the object, and matches the object
model and image data in moment space [6, 19, 1]. The chosen combinations of
moments are designed so that they are invariant to the image transformations
of concern, such as translations, dilation, and rotations. Thus, emphasis is
mainly placed on the identification of the object in terms of the object model
represented by the combinations of the moments, rather than on the recovery of
the transformation between the model and the image data.

In addition, most authors have not addressed the general affine transformation
case (instead only treating translation, dilation and scaling). An exception
is the method by Cyganski et al. [2] based on tensor analysis. They developed a
closed form method to identify a planar object in 3D space and to recover the
affine transformation which yields the best match between the image data and
the transformed model. The basis of their method is the contraction operation
of the tensors [12, 9] formed by the products of the contravariant moment
tensors of the image with a covariant permutation tensor that produces unit
rank tensors. Then, further combining those with zero-order tensors to remove
the weight, they derived linear equations for the affine parameters sought.
This method is quite elegant, but it turns out that it needs moments up to at
least fourth order. In general, the second type of method is very efficient
when compared with the first type of method, that is, methods based on local
features plus exhaustive search. At the same time, methods based on invariants
tend to be very sensitive to perturbations in the given image data. For
example, Cyganski's algorithm is known to be very efficient computationally;
however, since higher order moments are notorious for their sensitivity to
noise [18], it is very fragile when it comes to perturbations in the image
data, being particularly sensitive to local occlusions of object surfaces.

The algorithm that we propose in this paper can be classified in the second
category for the reason given below. It is more efficient than conventional
approaches in the first category, yet more stable than conventional methods of
the second category: (1) it relies on the presence of potentially
corresponding image fragments over different views, which are from planar
patches on the surface of the 3D objects, and (2) it provides a non-recursive,
that is, closed-form, method for object recognition. The method does not
require complete image regions to be visible and does not depend on the use of
local features such as edges or 'corners.' Our method also recovers the
transformation from the object model to the image data, but, unlike Cyganski's
method, it does not use moments of order higher than second or third order.
Therefore, compared with Cyganski's method, it should be less sensitive to
perturbations. In addition, we also present another new approach to robust
object recognition using differential properties of the image.

Thus, we propose two different algorithms: one based on an affine invariant
unique to the given image, which uses up to second or third order moments of
the image, and the other via a combination of second order statistics of
geometrical and differential properties of the image. Both algorithms recover
the affine parameters relating a given 2D view of the object to a model
composed of planar surfaces of a 3D object under the assumption of
orthographic projection [20, 10]. We also demonstrate that such methods based
on the differential properties of the image are fairly stable against
perturbations. Of course, the results are not perfect in the presence of
perturbations, but the new method does provide much better results than
conventional methods using global features. Although we do not explicitly
address the problem of how to extract corresponding regions for planar patches
in different views, it is known to be fairly feasible using one of several
existing techniques (e.g., [22, 21, 23, 13]). Once we have recovered the
affine transformation for the planar patches, we know that by using the 3D
object model we can immediately recover the full 3D information of the object
[7]. Therefore, our algorithm is aimed at direct 3D object recognition, by
first recognizing planar surfaces on the object, and then recovering full 3D
information, although the recovery of 3D information is not explicitly
addressed in this paper. Some experimental results on natural pictures
demonstrate the effectiveness of our algorithm. We also give here an analysis
of the sensitivity of the algorithm to perturbations in the given image data.

Figure 1: Commutative Diagram of Transformations.
Given model feature X and corresponding data feature X', we seek conditions on
the transformations A, A' such that this diagram commutes.

2  Recovering  affine  parameters  via  an 
affine  invariant  plus  rotation 
invariant  using  no  higher  than 
second/third  order  moments 

In this section, we present a closed form solution for recovering the affine
parameters with which a given image can be generated from the model, using an
affine invariant theory that we have recently proposed. We first summarize the
affine invariant description (up to rotations) of the image of planar
surfaces. Then, using this property, we show how the affine parameters are
recovered via direct computation in conjunction with the rotation invariant
using moments of the image.

2.1 An affine invariant up to rotations: a
unique class of linear transformations

In [15, 16, 17], we showed that there exists a class of transformations of the
image of a planar surface which generates unique projections of it up to
rotations in the image field. It was shown precisely that this class of
transformations is the only class of linear transformations which provides
invariance up to rotations, as long as we are concerned with no higher than
second order statistics of the image (see [15]). This property is summarized
in the following theorem.

[Theorem]

Let $X$ be a model feature position and $X'$ be the corresponding data feature
position in the 2D field. We can relate these by

$$X' = LX + \omega \tag{1}$$

where $L$ is a 2 x 2 matrix and $\omega$ is a 2D vector. Now suppose both
features are subjected to similar linear transformations

$$Y = AX + B \tag{2}$$

$$Y' = A'X' + B' \tag{3}$$

$$Y' = TY + C \tag{4}$$

where $A, A', T$ are 2 x 2 matrices and $B, B', C$ are 2D vectors. Then, if we
limit $T$ to an orthogonal matrix, a necessary and sufficient condition for
these linear transformations to commute (i.e., to arrive at the same values
for $Y'$) for all $X, X'$ (see Figure 1), as long as only up to second order
statistics of the features are available, is

$$A = c\,U \Lambda^{-\frac{1}{2}} \Phi^{T} \tag{5}$$

$$A' = c\,U' \Lambda'^{-\frac{1}{2}} \Phi'^{T} \tag{6}$$

where $\Phi$ and $\Phi'$ are eigenvector matrices and $\Lambda$ and $\Lambda'$
are eigenvalue matrices of the covariance matrices of $X$ and $X'$
respectively, $U$ and $U'$ are arbitrary orthogonal matrices, and $c$ is an
arbitrary scalar constant. The terms $[\cdot]^{\frac{1}{2}}$ denote square root
matrices [8] and $[\cdot]^{T}$ means matrix transpose. □

Furthermore, it was shown [15] that when (1) represents the motion of a plane,
and both $\Phi$ and $\Phi'$ represent rotations/reflections simultaneously,
and $U$ and $U'$ are set to some rotation matrices, then $T$ in (4) can be
constrained to be a rotation matrix. As another aspect of this normalization
process, we know that the transformations $A, A'$ defined in (5) and (6)
transform the respective distributions to have a covariance matrix that is the
identity matrix. Arguments were also given on the physical explanations of
this property for the rigid object case. In [15, 16, 17], to recover the
affine parameters using this property, we used a clustering technique to
derive three potentially corresponding clusters in the model and data 2D
features and used their centroids as matching features in the alignment
framework.

In this section, we present other methods to directly recover the affine
parameters using this invariant property. Recall that once we have normalized
the image using the transformations given in (5), (6), the shapes are unique
up to rotations. Thus, if we can compute the rotation matrix $T$ in (4) which
relates the normalized data image to the normalized model, we can recover the
affine transformation $L$ by

$$L = A'^{-1} T A \tag{7}$$

where the translational component has been removed, using the centroid
coincidence property [2, 15]. Note, however, that since this normalization
process transforms the covariance matrices into identity matrices times a
scale factor, the covariances can no longer be used to compute the rotation
angle between the normalized model and data features. So, we need to use some
other information to determine this rotation angle.
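
The normalization step and relation (7) can be illustrated with a minimal NumPy sketch. It assumes $c = 1$ and $U = U' = I$ in (5) and (6), uses a synthetic point set as a stand-in for a region, and introduces the hypothetical names `normalizing_matrix`, `model`, `data` and `L_true` purely for illustration. It checks that the normalized regions have identity covariance and that $A' L A^{-1}$ is orthogonal, which is what allows $L$ to be rebuilt from a rotation alone.

```python
import numpy as np

def normalizing_matrix(points):
    """Return Lambda^(-1/2) Phi^T for an (N, 2) array of region coordinates
    (equations (5)/(6) with c = 1 and U = I, an assumption of this sketch)."""
    cov = np.cov(points, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # columns of eigvecs are eigenvectors
    return np.diag(eigvals ** -0.5) @ eigvecs.T

rng = np.random.default_rng(0)
# Hypothetical model region: scattered points, elongated along x.
model = rng.uniform(-1.0, 1.0, size=(500, 2)) * [3.0, 1.0]
# Hypothetical affine map X' = L X + w relating model and data.
L_true = np.array([[1.2, 0.4], [-0.3, 0.9]])
data = model @ L_true.T + [5.0, -2.0]

A = normalizing_matrix(model)
A_prime = normalizing_matrix(data)

# After normalization both regions have identity covariance ...
cov_norm = np.cov(data @ A_prime.T, rowvar=False)
# ... and the map between them, T = A' L A^(-1), is orthogonal.
T = A_prime @ L_true @ np.linalg.inv(A)
# Equation (7): the affine map is rebuilt from T and the two normalizers.
L_rec = np.linalg.inv(A_prime) @ T @ A
```

In practice $T$ is not available from `L_true`; it has to be estimated from additional information, as the following subsections describe.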

2.2  Computing  the  rotation  angle  using  second 
order  weighted  moments  of  the  image 

Although the binary images of the model and the data are normalized by the
matrices $A, A'$ so that they have identity covariance matrices, the weighted
moments of the image function (for instance, the brightness of the image) are
not normalized in that sense. Therefore, we can compute the rotation angle
between the normalized binary images of model and data by first computing the
orientation of the major axes of each image in terms of the weighted moments
with respect to fixed coordinates. We then take the difference between the
absolute orientations of the model and the data computed in this fashion to
give the relative rotation angle.


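$$\tan 2\theta = \frac{2 M_{1,1}}{M_{2,0} - M_{0,2}} \tag{8}$$

where the $M_{i,j}$ are second order weighted moments of the normalized image,
given by:

$$M_{2,0} = \sum_{x} \sum_{y} x^{2} f(x, y) \tag{9}$$

$$M_{1,1} = \sum_{x} \sum_{y} x y\, f(x, y) \tag{10}$$

$$M_{0,2} = \sum_{x} \sum_{y} y^{2} f(x, y) \tag{11}$$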

where the origin of the normalized coordinates has been centered at the
centroid of each normalized region, $\theta$ is the orientation of the
normalized image, and $f(x, y)$ is an image function, such as brightness,
defined on the normalized coordinates. For the 'image' function, however,
brightness may not necessarily be the best choice. A desirable property of the
'image' function here is stability under varying ambient light conditions and
under changes in the relative orientation of the object surface with respect
to the camera and the light source in 3D space. From the form of (8) with
(9)-(11) it is clear that the rotation angle thus recovered is never affected
by a scale change of the image function between the model and data views.
Therefore, the property we need here from the image function is not perfect
constancy, but merely constancy within a scale factor under different
illumination conditions. This is not a hard requirement in practice because we
are now focusing on the properties of planar surfaces. For example, if the
sensor channels are narrow band it is known that the outputs are invariant up
to a consistent scale factor over the entire surface (see, e.g., [14]). By
equation (8), we get two different candidate angles (by taking the direction
of the eigenvector with the larger eigenvalue). To select the correct one, we
can align the given image data with the image reconstructed from the model
using the affine parameters recovered via (7), and pick the one that gives the
best match.
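
As a hedged illustration of (8)-(11), the following sketch computes the orientation from weighted second order moments. The function name `weighted_orientation`, the synthetic grid and the weight function `f` are assumptions made for the example, and for simplicity the test pattern is not whitened first. `arctan2` is used so that the returned angle corresponds to the axis with the larger moment; the two candidates mentioned above are then $\theta$ and $\theta + \pi$.

```python
import numpy as np

def weighted_orientation(points, weights):
    """Orientation theta from (8), using the weighted moments (9)-(11)."""
    centered = points - points.mean(axis=0)      # origin at the region centroid
    x, y = centered[:, 0], centered[:, 1]
    m20 = np.sum(x * x * weights)
    m11 = np.sum(x * y * weights)
    m02 = np.sum(y * y * weights)
    # arctan2 picks the solution of tan(2*theta) along the major axis.
    return 0.5 * np.arctan2(2.0 * m11, m20 - m02)

# Hypothetical test pattern: a grid whose weights are symmetric about both axes.
gx, gy = np.meshgrid(np.linspace(-2, 2, 41), np.linspace(-1, 1, 21))
base = np.column_stack([gx.ravel(), gy.ravel()])
f = 1.0 + base[:, 0] ** 2                        # stand-in for the image function

alpha = 0.3                                      # rotation applied to the pattern
R = np.array([[np.cos(alpha), -np.sin(alpha)],
              [np.sin(alpha),  np.cos(alpha)]])
rotated = base @ R.T

theta = weighted_orientation(rotated, f)
theta_scaled = weighted_orientation(rotated, 5.0 * f)   # global brightness change
candidates = (theta, theta + np.pi)              # the two candidate directions
```

Scaling the weights by a constant leaves the angle unchanged, which mirrors the remark that only constancy up to a scale factor is required of the image function.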


2.3 Rotation angle via Hu's moment
invariants: using 3rd order moments

If the image does not have enough texture, or if it is a binary image, we
cannot use weighted moments of the image to compute the rotation angle between
the normalized image data and the model. In this case, however, we can use the
third order moments of the binary image. The use of higher order moments for
invariance to rotation has been extensively discussed in pattern recognition
(e.g., [6, 19, 1]). As a by-product of the study of invariance, a method for
computing the rotation angle using higher order moments was also presented in
[6], which we rewrite here:

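$$I'_{30} \equiv (N'_{3,0} - 3N'_{1,2}) - i\,(3N'_{2,1} - N'_{0,3})
         = e^{i3\theta}\left[(N_{3,0} - 3N_{1,2}) - i\,(3N_{2,1} - N_{0,3})\right]
         = e^{i3\theta} I_{30} \tag{12}$$

$$I'_{21} \equiv (N'_{3,0} + N'_{1,2}) - i\,(N'_{2,1} + N'_{0,3})
         = e^{i\theta}\left[(N_{3,0} + N_{1,2}) - i\,(N_{2,1} + N_{0,3})\right]
         = e^{i\theta} I_{21} \tag{13}$$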

where $N_{pq}$ and $N'_{pq}$ are the respective third order moments of the
normalized binary image for model and data, given in the following (shown only
for the normalized model view), and $I_{pq}$, $I'_{pq}$ are the complex moment
invariants proposed in [6].

$$N_{3,0} = \sum_{x} \sum_{y} x^{3} \tag{14}$$

$$N_{2,1} = \sum_{x} \sum_{y} x^{2} y \tag{15}$$

$$N_{1,2} = \sum_{x} \sum_{y} x y^{2} \tag{16}$$

$$N_{0,3} = \sum_{x} \sum_{y} y^{3} \tag{17}$$

where the sums are taken over the entire region of the binary image in the
normalized coordinates, in which the coordinate origin has been centered at
the centroid of the region.

Thus, we have:

$$\tan 3\theta = V_3 / \delta_3 \tag{18}$$

$$\tan \theta = V_1 / \delta_1 \tag{19}$$

where

$$V_3 = (N'_{3,0} - 3N'_{1,2})(3N_{2,1} - N_{0,3})
      - (3N'_{2,1} - N'_{0,3})(N_{3,0} - 3N_{1,2}) \tag{20}$$

$$\delta_3 = (N'_{3,0} - 3N'_{1,2})(N_{3,0} - 3N_{1,2})
           + (3N'_{2,1} - N'_{0,3})(3N_{2,1} - N_{0,3}) \tag{21}$$

$$V_1 = (N'_{3,0} + N'_{1,2})(N_{2,1} + N_{0,3})
      - (N'_{2,1} + N'_{0,3})(N_{3,0} + N_{1,2}) \tag{22}$$

$$\delta_1 = (N'_{3,0} + N'_{1,2})(N_{3,0} + N_{1,2})
           + (N'_{2,1} + N'_{0,3})(N_{2,1} + N_{0,3}) \tag{23}$$
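
Equations (14)-(17), (19), (22) and (23) can be sketched as follows. This is an illustrative sketch only: the function names and the synthetic skewed point set are assumptions, and only the unambiguous formula (19) is implemented ((18) would determine the angle only modulo $2\pi/3$). With the sign convention written in (13), rotating the point set counterclockwise by an angle `phi` yields `theta = -phi` in this sketch; the sign should be flipped if the opposite convention is wanted.

```python
import numpy as np

def third_order_moments(points):
    """Central third order moments (N30, N21, N12, N03) of a binary region,
    given as an (N, 2) array of pixel coordinates (equations (14)-(17))."""
    centered = points - points.mean(axis=0)
    x, y = centered[:, 0], centered[:, 1]
    return np.sum(x**3), np.sum(x**2 * y), np.sum(x * y**2), np.sum(y**3)

def rotation_from_third_moments(model_pts, data_pts):
    """Angle theta from (19), with V1 and delta1 as in (22) and (23)."""
    n30, n21, n12, n03 = third_order_moments(model_pts)
    p30, p21, p12, p03 = third_order_moments(data_pts)   # primed (data) moments
    v1 = (p30 + p12) * (n21 + n03) - (p21 + p03) * (n30 + n12)
    d1 = (p30 + p12) * (n30 + n12) + (p21 + p03) * (n21 + n03)
    return np.arctan2(v1, d1)    # arctan2 keeps the full range (-pi, pi]

rng = np.random.default_rng(0)
# Hypothetical asymmetric region: skewed samples, so third moments are non-zero.
model_pts = rng.exponential(size=(400, 2)) * [2.0, 1.0]

phi = 0.7                        # counterclockwise rotation applied to the points
R = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])
data_pts = model_pts @ R.T

theta = rotation_from_third_moments(model_pts, data_pts)
```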

An aspect to which we must pay careful attention in using Hu's moment
invariants is that of n-fold rotational symmetry. As argued in [6], moment
combinations $I_{pq}$ with the factor $e^{iw\theta}$, where $w/n$ is not an
integer, are identically zero if the shape has n-fold rotational symmetry, so
that we cannot use those moments for recovering the rotation angle (see [6]
for details). For example, for the 4-fold symmetry case such as a square, both
of the formulas given in (18) and (19) are useless, and we need higher than
third order moments. This happens if the original surface shape is a rectangle
when viewed from a particular direction in 3D space (of course including the
frontal direction). This is because if some image of the surface can be a
rectangle, then, no matter from which direction it is viewed, its normalized
binary image becomes a square, and hence has 4-fold symmetry. This is a
consequence of the normalization process we are using (see [15] for details).
We will see this case in the experiments below.


2.4  Results  in  using  invariants  on  natural 
pictures 

We now show experimental results obtained using the proposed algorithm for
recovering affine parameters based on affine invariants of natural pictures.
All the pictures shown here were taken under natural light conditions. The
image regions, which are from planar patches on the object surfaces, were
extracted manually with some care, but some perturbations may be introduced by
this step of the procedure. Figure 2 shows the results on images of a
Cocoa-Box. The upper row shows the two gray level pictures of the same
Cocoa-Box taken from different viewpoints: the left view was used for the
model, while the right was used for the data. The left and right figures in
the middle row show the respective normalized images up to a rotation. Indeed,
we see that the two figures coincide if we rotate the left figure by 180
degrees around its centroid. The left and right figures in the lower row are
the respective image data reconstructed from the model view (shown in the
upper left) by the recovered affine transformation, using the affine invariant
plus second order weighted moments of the gray level (lower left) and third
order moments of the binary image (lower right) for computing the rotation
angle. If the method works correctly, then those reconstructed images should
coincide with the corresponding image portion found in the upper right figure.
Indeed, we see that both of the methods worked very well for recovering the
transformation parameters.

In Figure 3 the results are shown for pictures of a Baby-Wipe container. The
upper row shows the source gray level pictures of a Baby-Wipe container, of
which the front part was used for the experiment: the left view was used for
the model, the right view was used for the data. The left and right figures in
the middle row show the respective normalized images. The lower figure is the
image data reconstructed from the model view using the affine transformation
recovered by means of the affine invariant plus second order weighted moments
for computing the rotation angle. We would expect the reconstructed image to
coincide well with the image in the upper right. From the figure, we see that
this method, i.e., affine invariant plus second order weighted moments, worked
very well for recovering the parameters. As observed in the figures, the
normalized images are almost 4-fold rotationally symmetric, so that, as
described previously, we cannot use the third order moments of the normalized
binary image to recover the rotation angle.

Figure 4 shows the results on some Tea-Box pictures. The upper row shows the
pictures of a Tea-Box: the left view was used for the model, while the right
view was used for the data. The left and right figures in the middle row are
the respective normalized images up to a rotation. The left and right figures
in the lower row show the respective image data reconstructed from the model
view using the recovered affine transformation, based on the affine invariant
plus second order weighted moments of the gray level (left) and third order
moments of the binary image (right) for recovering the rotation angle. From
the figure, we see that both of the reconstructed images coincide well with
the original data shown in the upper right. Though both methods worked fairly
well, the method using second order weighted moments performed slightly
better. Considering that both of the reconstructed images are tilted a little
in a similar manner, perhaps some errors were introduced in the manual region
extraction.

3  A  sensitivity  analysis  in  the  use  of 
affine  plus  rotation  invariant 

In this section we analyze the sensitivity to perturbations in the image data
of the proposed algorithm for recovering affine transformations using the
affine invariant plus second order weighted moments of the image function.
Perturbations are caused, for example, by errors in region extraction, by lack
of planarity of the object surface, or by occlusions. From (7), we know that
the sensitivity of the recovered affine parameters to perturbations depends
solely on the stability of $A'$, the matrix normalizing the given binary
image, and $T$, the rotation matrix relating the normalized model and data
views, since we assume that the model, and hence $A$, does not include any
perturbations. As described in Section 2.1, the transformation $A'$ can be
computed solely from the eigenvalues and eigenvectors of the covariance matrix
of the original binary image, i.e., the set of $(x, y)$ coordinates contained
in the image region. Therefore, if the given image contains perturbations,
these affect the matrix $A'$, but only through the covariances. In other
words, the errors in $A'$ can be completely described by the perturbations
expressed in terms of covariances. On the other hand, the effect of the
perturbations on the recovered rotation matrix differs according to which
algorithm we use for computing the rotation, namely, the weighted moments of
the image attributes, or the third order moments of the binary image of the
objects. In this section, we only show the case of second order weighted
moments of the image attributes. The perturbation analysis of the algorithm
based on third order moments may be presented in a subsequent paper.

3.1  Analytical  formula  for  sensitivity 

In the following, we derive the sensitivity formulas for the affine parameters
to be recovered, given perturbations in the image data with respect to the
model. Let the ideal description (without any errors) of the normalization
process be written as:

$$\tilde{A}' \tilde{L} A^{-1} = \tilde{T} \tag{24}$$

so that the affine parameters are recovered by (cf. (7)):

$$\tilde{L} = \tilde{A}'^{-1} \tilde{T} A \tag{25}$$

Throughout the subsequent parts of the paper, we consistently use the notation
$\tilde{[\,\cdot\,]}$ (tilde) for ideal parameter values and the notation
without tilde for actually observed values, unless otherwise stated. Then, the
perturbation $-\Delta L = L - \tilde{L}$ of $L$ is given as follows:

$$\begin{aligned}
-\Delta L &= (A'^{-1} T - \tilde{A}'^{-1} \tilde{T})\, A \\
&= \{(\tilde{A}' - \Delta A')^{-1} (\tilde{T} - \Delta T) - \tilde{A}'^{-1} \tilde{T}\}\, A \\
&= \{(I - \tilde{A}'^{-1} \Delta A')^{-1} \tilde{A}'^{-1} (\tilde{T} - \Delta T) - \tilde{A}'^{-1} \tilde{T}\}\, A \\
&= \Big[\sum_{k=0}^{\infty} (\tilde{A}'^{-1} \Delta A')^{k}\, \tilde{A}'^{-1} (\tilde{T} - \Delta T) - \tilde{A}'^{-1} \tilde{T}\Big] A \\
&= \tilde{A}'^{-1} (\Delta A' \tilde{A}'^{-1} \tilde{T} - \Delta T)\, A + O(\Delta^{2})
\end{aligned} \tag{26}$$

where $-\Delta A'$ and $-\Delta T$ are the respective perturbations of $A'$ and
$T$ such that $-\Delta A' = A' - \tilde{A}'$, $-\Delta T = T - \tilde{T}$. The
minus signs for the perturbations are for consistency with the perturbation of
the covariances which will appear soon. Thus, ignoring the higher than first
order terms, our job is now to derive formulas for $\Delta T$ and
$\Delta A'$ in terms of the perturbations contained in the image data.
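
The first order expression in (26) can be checked numerically. The sketch below uses arbitrary, hypothetical matrices for $A$, $\tilde{A}'$, $\Delta A'$ and $\Delta T$ (the perturbed $T$ is not constrained to be a rotation here, since only the algebra of the expansion is being checked) and compares the exact difference $L - \tilde{L}$ with the linearized formula.

```python
import numpy as np

inv = np.linalg.inv

# Hypothetical values chosen only to exercise the formula.
A = np.array([[1.5, 0.2], [0.1, 0.8]])            # model normalizer (unperturbed)
A_ideal = np.array([[0.9, -0.3], [0.4, 1.1]])     # ideal data normalizer
angle = 0.4
T_ideal = np.array([[np.cos(angle), -np.sin(angle)],
                    [np.sin(angle),  np.cos(angle)]])

eps = 1e-4
dA = eps * np.array([[1.0, -2.0], [0.5, 1.5]])    # Delta A'
dT = eps * np.array([[0.3, -1.0], [1.0, 0.3]])    # Delta T

A_obs = A_ideal - dA                              # observed = ideal - Delta
T_obs = T_ideal - dT

# Exact -Delta L = L - L_ideal (first line of (26)).
exact = (inv(A_obs) @ T_obs - inv(A_ideal) @ T_ideal) @ A
# First order approximation (last line of (26)).
first_order = inv(A_ideal) @ (dA @ inv(A_ideal) @ T_ideal - dT) @ A

rel_err = np.linalg.norm(exact - first_order) / np.linalg.norm(exact)
```

The relative error shrinks in proportion to `eps`, as expected for a remainder of order $O(\Delta^2)$.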


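[Perturbations in A']

As observed in (6), $A'$ is a function of the eigenvalues $\lambda_r$ and
eigenvectors $\Phi_r$ of the covariance matrix $\Sigma'$ such that
$A'_{ij}(\lambda, \Phi) = \lambda_i^{-\frac{1}{2}} \Phi_{ji}$, where
$\lambda_r$ is the $r$th eigenvalue and $\Phi_{sr}$ is the $s$th component of
the corresponding $r$th eigenvector $\Phi_r$. Let $\tilde{A}'_{ij}$ be the
ideal value for $A'_{ij}$, the $ij$ component of the matrix $A'$. Then, we get
a formula for the perturbations $\Delta A'_{ij}$ from the Taylor expansion of
$A'_{ij}$ in terms of $\lambda$ and $\Phi$ as follows:

$$\begin{aligned}
-\Delta A'_{ij} &= A'_{ij} - \tilde{A}'_{ij} \\
&= A'_{ij}(\tilde{\lambda}_i - \Delta\lambda_i,\; \tilde{\Phi}_{ji} - \Delta\Phi_{ji}) - A'_{ij}(\tilde{\lambda}_i,\; \tilde{\Phi}_{ji}) \\
&= -\frac{\partial A'_{ij}}{\partial \lambda_i} \Delta\lambda_i - \frac{\partial A'_{ij}}{\partial \Phi_{ji}} \Delta\Phi_{ji} + O(\Delta^{2}) \\
&= \tilde{\lambda}_i^{-\frac{1}{2}} \Big(\tfrac{1}{2} \tilde{\lambda}_i^{-1} \Delta\lambda_i\, \tilde{\Phi}_{ji} - \Delta\Phi_{ji}\Big) + O(\Delta^{2})
\end{aligned} \tag{27}$$

where the perturbations of the eigen properties are defined as
$-\Delta\lambda_i = \lambda_i - \tilde{\lambda}_i$,
$-\Delta\Phi_i = \Phi_i - \tilde{\Phi}_i$.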

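Here, from perturbation theory [8], we have:

$$-\Delta\Phi_k = -\frac{\tilde{\Phi}_l^{T}\, \Delta\Sigma'\, \tilde{\Phi}_k}{\tilde{\lambda}_k - \tilde{\lambda}_l}\; \tilde{\Phi}_l \tag{28}$$

$$-\Delta\lambda_k = -\tilde{\Phi}_k^{T}\, \Delta\Sigma'\, \tilde{\Phi}_k \tag{29}$$

where $(k, l) \in \{(1, 2), (2, 1)\}$ and $-\Delta\Sigma'$ is the perturbation
of the given covariances such that $-\Delta\Sigma' = \Sigma' - \tilde{\Sigma}'$.
The minus sign of the perturbation of covariances accounts for the occlusions
(being occluded by some other surface) occurring in the given image data.
Substituting (28) and (29) into (27), we obtain: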


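$$-\Delta A'_{11} = -\tilde{\lambda}_1^{-\frac{1}{2}}\, \frac{\tilde{\Phi}_2^{T} \Delta\Sigma' \tilde{\Phi}_1}{\tilde{\lambda}_1 - \tilde{\lambda}_2}\, \tilde{\Phi}_{12}
  + \tfrac{1}{2} \tilde{\lambda}_1^{-\frac{3}{2}} (\tilde{\Phi}_1^{T} \Delta\Sigma' \tilde{\Phi}_1)\, \tilde{\Phi}_{11} \tag{30}$$

$$-\Delta A'_{12} = -\tilde{\lambda}_1^{-\frac{1}{2}}\, \frac{\tilde{\Phi}_2^{T} \Delta\Sigma' \tilde{\Phi}_1}{\tilde{\lambda}_1 - \tilde{\lambda}_2}\, \tilde{\Phi}_{22}
  + \tfrac{1}{2} \tilde{\lambda}_1^{-\frac{3}{2}} (\tilde{\Phi}_1^{T} \Delta\Sigma' \tilde{\Phi}_1)\, \tilde{\Phi}_{21} \tag{31}$$

$$-\Delta A'_{21} = -\tilde{\lambda}_2^{-\frac{1}{2}}\, \frac{\tilde{\Phi}_1^{T} \Delta\Sigma' \tilde{\Phi}_2}{\tilde{\lambda}_2 - \tilde{\lambda}_1}\, \tilde{\Phi}_{11}
  + \tfrac{1}{2} \tilde{\lambda}_2^{-\frac{3}{2}} (\tilde{\Phi}_2^{T} \Delta\Sigma' \tilde{\Phi}_2)\, \tilde{\Phi}_{12} \tag{32}$$

$$-\Delta A'_{22} = -\tilde{\lambda}_2^{-\frac{1}{2}}\, \frac{\tilde{\Phi}_1^{T} \Delta\Sigma' \tilde{\Phi}_2}{\tilde{\lambda}_2 - \tilde{\lambda}_1}\, \tilde{\Phi}_{21}
  + \tfrac{1}{2} \tilde{\lambda}_2^{-\frac{3}{2}} (\tilde{\Phi}_2^{T} \Delta\Sigma' \tilde{\Phi}_2)\, \tilde{\Phi}_{22} \tag{33}$$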


The equations (30)-(33) give the first order approximation of the perturbation
$\Delta A'_{ij}$ for $A'_{ij}$, which is a linear combination of the
perturbations $\Delta\Sigma'_{pq}$ such that
$\Delta A'_{ij} = \sum_{pq} \Gamma_{pq}\, \Delta\Sigma'_{pq}$, where the
$\Gamma_{pq}$ are coefficients that are composed of the eigen properties of
the covariance matrix of the ideal image data, that are uniquely determined by
(30)-(33), and that are independent of the perturbations.

[Perturbations in T: the rotation matrix]

In deriving an analytical formula for the perturbation $\Delta T$, we rely on
the formulas given in (8)-(11), relating the rotation angle to the second
order weighted moments of the image (as we have fixed the orientation of the
model, the orientation of the given image can be seen to be equivalent to the
rotation angle). Further, we have the following relation between the weighted
moments of the original and the normalized images.

$$[\tilde{M}'] = \tilde{A}' [\tilde{m}'] \tilde{A}'^{T} \tag{34}$$

$$[M'] = A' [m'] A'^{T} \tag{35}$$

$$\phantom{[M']} = (\tilde{A}' - \Delta A')([\tilde{m}'] - [\Delta m'])(\tilde{A}' - \Delta A')^{T} \tag{36}$$

where $[m']$, $[M']$ are the respective symmetric matrices of original and
transformed weighted moments defined in the following, and the term
$[\Delta m']$ is the matrix of the perturbation contained in the original
image data in terms of the weighted moments:


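$$[m'] = \begin{bmatrix} m'_{20} & m'_{11} \\ m'_{11} & m'_{02} \end{bmatrix} \tag{37}$$

$$[M'] = \begin{bmatrix} M'_{20} & M'_{11} \\ M'_{11} & M'_{02} \end{bmatrix} \tag{38}$$

$$-[\Delta m'] = [m'] - [\tilde{m}'] \tag{39}$$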

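Let $\theta$, $\tilde{\theta}$ be the recovered and ideal rotation angles and
$-\Delta\theta$ be the corresponding perturbation, where we assume that
$\Delta\theta$ is small enough such that:

$$\begin{aligned}
-\Delta T &= \begin{bmatrix} \cos(\tilde{\theta} - \Delta\theta) & -\sin(\tilde{\theta} - \Delta\theta) \\ \sin(\tilde{\theta} - \Delta\theta) & \cos(\tilde{\theta} - \Delta\theta) \end{bmatrix}
 - \begin{bmatrix} \cos\tilde{\theta} & -\sin\tilde{\theta} \\ \sin\tilde{\theta} & \cos\tilde{\theta} \end{bmatrix} \\
&= \begin{bmatrix} \Delta\theta \sin\tilde{\theta} & \Delta\theta \cos\tilde{\theta} \\ -\Delta\theta \cos\tilde{\theta} & \Delta\theta \sin\tilde{\theta} \end{bmatrix} + O(\Delta^{2})
\end{aligned} \tag{40}$$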
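In the following, we derive a formula for the perturbation $\Delta\theta$ when
using second order weighted moments of the image. As we assumed that
$-\Delta\theta = \theta - \tilde{\theta}$ is small enough, we can approximate
it as:

$$\begin{aligned}
-\Delta\theta &= \tfrac{1}{2}(2\theta - 2\tilde{\theta}) \\
&\approx \tfrac{1}{2} \tan(2\theta - 2\tilde{\theta}) \\
&= \tfrac{1}{2}\, \frac{\tan(2\theta) - \tan(2\tilde{\theta})}{1 + \tan(2\theta)\tan(2\tilde{\theta})} \\
&\approx \tfrac{1}{2}\, \frac{\tan(2\theta) - \tan(2\tilde{\theta})}{1 + \{\tan(2\tilde{\theta})\}^{2}}
\end{aligned} \tag{41}$$

Substituting the relation presented in (8), we get:

$$-\Delta\theta \approx \frac{1}{1 + \{\tan(2\tilde{\theta})\}^{2}} \left( \frac{M'_{11}}{M'_{20} - M'_{02}} - \frac{\tilde{M}'_{11}}{\tilde{M}'_{20} - \tilde{M}'_{02}} \right) \tag{42}$$

$$\phantom{-\Delta\theta} = \frac{1}{1 + \{\tan(2\tilde{\theta})\}^{2}} \cdot \frac{1}{(M'_{20} - M'_{02})(\tilde{M}'_{20} - \tilde{M}'_{02})}\; J \tag{43}$$

$$\phantom{-\Delta\theta} \approx \frac{1}{(\tilde{M}'_{20} - \tilde{M}'_{02})^{2} + (2\tilde{M}'_{11})^{2}}\; J \tag{44}$$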


where 

J  =  M[,{M^,-MU)-M[i{M'^o-MU)  (45) 
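Substituting (37)-(39) into (45) we get:

\begin{align}
J ={}& e_{11}\Delta A'_{11} + e_{12}\Delta A'_{12} + e_{21}\Delta A'_{21} + e_{22}\Delta A'_{22} \notag\\
&+ f_{20}\Delta m'_{20} + f_{11}\Delta m'_{11} + f_{02}\Delta m'_{02} + O(\Delta^{2}) \tag{46}
\end{align}

where the $e_{ij}$'s and $f_{pq}$'s are the respective coefficients of $\Delta A'_{ij}$
and $\Delta m'_{pq}$; they are composed of the components $A'_{ij}$ and $m'_{rs}$ and are
independent of the perturbations involved in the given image data.

Then, combining (40), (44), and (46), we get $\Delta T$.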
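
The first order formula (44), with $J$ from (45), can be checked numerically. The sketch below is only an illustration: it assumes that relation (8), which is not reproduced in this section, has the standard form $\tan 2\theta = 2M'_{11}/(M'_{20} - M'_{02})$, and the function names and moment values are hypothetical.

```python
import numpy as np

def orientation(m20, m11, m02):
    """Rotation angle from second order moments, ASSUMING that (8) reads
    tan(2*theta) = 2*M11 / (M20 - M02)."""
    return 0.5 * np.arctan2(2.0 * m11, m20 - m02)

def first_order_angle_change(ideal, perturbed):
    """First order estimate of theta_tilde - theta (i.e. -delta_theta)
    following (44), with J as defined in (45)."""
    m20, m11, m02 = ideal
    p20, p11, p02 = perturbed
    j = p11 * (m20 - m02) - m11 * (p20 - p02)
    return j / ((m20 - m02) ** 2 + (2.0 * m11) ** 2)

# Hypothetical moment triples (M20, M11, M02): ideal and slightly perturbed.
ideal = (5.0, 1.0, 2.0)
perturbed = (5.0003, 1.0002, 1.9999)

exact = orientation(*perturbed) - orientation(*ideal)
approx = first_order_angle_change(ideal, perturbed)
print(exact, approx)
```

For perturbations of this size the two values should agree up to terms of second order in the moment perturbations, which is what the $O(\Delta^2)$ remainder in the derivation predicts.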


[Perturbation  in  L] 

Finally, combining the formulas for $\Delta T$ thus derived with (30)-(33) and
substituting them into (26), we obtain the perturbation $\Delta L$:

$$-\Delta L_{ij} \approx \sum_{r,s} \left( \xi^{ij}_{rs}\,\Delta\Sigma'_{rs} + \zeta^{ij}_{rs}\,\Delta m'_{rs} \right) \tag{47}$$

where $\xi^{ij}_{rs}$, $\zeta^{ij}_{rs}$ are coefficients that are exclusively composed of
the components $A'_{ij}$, $m'_{rs}$ and $A_{ij}$, that are independent of the
perturbations, and $(r, s) \in \{(2, 0), (1, 1), (0, 2)\}$. By this linear
combination of the $\Delta\Sigma'_{rs}$ and $\Delta m'_{rs}$, we have obtained the first
order approximation of $\Delta L_{ij}$, the perturbation of $L_{ij}$, given the
perturbations in the original image data in terms of the second order moments
of the binary image ($\Delta\Sigma'_{rs}$), and the second order weighted moments of
the image attributes ($\Delta m'_{rs}$).

3.2  Experiments 

Now we show the sensitivity of the proposed algorithm for recovering affine
parameters based on affine invariant plus second order weighted moments to
perturbations of the given image region. From (47), we know that the
perturbation of each recovered component $L_{ij}$ is a linear combination of
perturbations of moments of the given image. Here, for simplicity, we try to
capture the overall trend of the sensitivity of $L$ to perturbations in the
given data by examining the following formulas:


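$$\delta = \sqrt{\frac{\sum_{i,j} \Delta L_{ij}^{2}}{\sum_{i,j} L_{ij}^{2}}} \tag{48}$$

against:

$$\sigma = \cdots + (\Delta m'_{rs})^{2} \cdots \tag{49}$$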


where the formulas involve a balancing parameter, which we set to 255/2 in the
following experiments. The terms $\delta$, $\sigma$ express respectively the
normalized errors of the recovered affine parameters and the normalized
perturbations in terms of the moments of the image. We expect that those two
quantities show a monotonic relation when perturbations in the moments are
small. Of course, we know from the above arguments that there will be some
complicated interactions between the two, but we hope some insight may be
obtained by observing those two quantities. We use the same picture of a
Cocoa-Box used in the earlier experiments. To study the effects of occlusion,
perturbations in the image data were produced by dropping particular connected
regions from the (almost) perfect image data, as given in Figure 5. The upper
pictures show examples of the perturbed image data for which some percentage of
the image region was dropped: left 5%, middle 15%, right 25%. The lower pictures
show the respective reconstructed image data. Figure 6 shows $\delta$ (vertical
axis) versus $\sigma$ (horizontal axis), in which the perturbations were taken from
2.5% to 25% in steps of 2.5%. From the figure, we see that $\delta$, accuracy in
recovering affine parameters, is almost proportional to $\sigma$, the
perturbations, when it is small, but the slope increases sharply as $\sigma$
increases.


4  Using  differential  properties  of  the 
image:  without  invariants 

In  this  section,  we  derive  another  constraint  equation  on 
affine  parameters  based  on  the  differential  properties  of 
the  image,  and  combine  it  with  the  canonical  geometrical 
constraint  given  in  (1)  to  recover  the  affine  transforma¬ 
tion.  We  rewrite  here  the  geometric  constraint  on  the 
motion  of  planar  surfaces  for  convenience: 

X'  =  LX  (50) 

where the translational component has been eliminated (based on the invariance
of the region centroids). Deriving the covariance matrices on both sides of
(50) gives:

$$\Sigma_{X'} = L\,\Sigma_{X}\,L^{T} \tag{51}$$

where the indices of the covariances $\Sigma_{X'}$, $\Sigma_{X}$ show the corresponding
distributions. Due to the symmetry of the covariance matrix, we have only three
independent equations in (51) for four unknowns that are the components of $L$.
Therefore, we need another constraint to solve for $L$. (Comment: The constraint
on the ratio of the image areas, $\det[L] = \mathrm{AREA}(X')/\mathrm{AREA}(X)$, is
redundant here when one employs (51).) From this point of view, what we have
done in the preceding sections can be seen as imposing constraints on the
rotations between the normalized (up to rotations) images either in terms of
weighted moments of some image attribute or using third order moments of the
binary image. Here, we will seek another constraint which does not use
invariants, based on the differential property of the image which is related to
the underlying geometry of the image.
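
To illustrate why (51) alone leaves one degree of freedom, the following sketch checks (51) on synthetic points and then builds a second, different matrix that satisfies (51) equally well. The point set, the matrix `L`, and the helper names are hypothetical choices made only for this illustration.

```python
import numpy as np

def covariance(points):
    """Second order central moments of a 2 x N point set."""
    centered = points - points.mean(axis=1, keepdims=True)
    return centered @ centered.T / points.shape[1]

def spd_sqrt(s):
    """Symmetric positive definite square root via eigendecomposition."""
    w, v = np.linalg.eigh(s)
    return (v * np.sqrt(w)) @ v.T

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 500)) * np.array([[3.0], [1.0]])  # synthetic region points
L = np.array([[1.2, 0.4], [-0.3, 0.9]])                   # hypothetical affine map
Xp = L @ X                                                # X' = L X, as in (50)

Sx, Sxp = covariance(X), covariance(Xp)
holds_for_L = np.allclose(Sxp, L @ Sx @ L.T)              # constraint (51)

# Insert an arbitrary rotation between the square roots of Sx: the resulting
# matrix differs from L but reproduces the same covariance Sxp.
phi = 0.7
R = np.array([[np.cos(phi), -np.sin(phi)], [np.sin(phi), np.cos(phi)]])
root = spd_sqrt(Sx)
L_alt = L @ root @ R @ np.linalg.inv(root)
holds_for_alt = np.allclose(Sxp, L_alt @ Sx @ L_alt.T)
same_matrix = np.allclose(L, L_alt)
print(holds_for_L, holds_for_alt, same_matrix)
```

The free rotation angle `phi` is exactly the unknown that an additional constraint has to fix.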

4.1  Deriving  another  constraint  based  on 
differential  properties  of  the  image 

To derive another constraint on affine parameters, suppose that we have an
image attribute $E(X)$, some scalar function of position $X$ in the image field,
that is related to $E'(X')$ of the corresponding point $X'$ in another view by:

$$E(X) = \frac{1}{\rho}\,E'(X') \tag{52}$$

where $X$ and $X'$ are related by (50) and $\rho$ is a scalar constant. This
represents a constraint that the changes of the function $E$ between the
different views are only within a scale factor that is consistent over the
specified region. Again, we can claim, as in the previous discussion of 2.2,
that this constraint is a fairly reasonable one. Taking the gradient of both
sides of (52),

$$(E_{x}, E_{y})^{T} = \frac{1}{\rho}\,J^{T}\,(E'_{x'}, E'_{y'})^{T} \tag{53}$$

where $E_{s}$ denotes the partial derivative of $E$ with respect to the variable
$s$, and $J$ is the Jacobian of $X'$ with respect to $X$ such that

\begin{align}
J &= \begin{pmatrix} \partial x'/\partial x & \partial x'/\partial y \\ \partial y'/\partial x & \partial y'/\partial y \end{pmatrix} \tag{54}\\
&= L \tag{55}
\end{align}

we get a similar constraint to that on the geometry given in (50), in the
differential image, that includes the same affine parameters $L$:

$$U = \frac{1}{\rho}\,L^{T}\,U' \tag{56}$$

where $U = (E_{x}, E_{y})^{T}$ and $U' = (E'_{x'}, E'_{y'})^{T}$. Taking the covariances
brings another constraint on affine parameters in terms of the second order
statistics of the differential image as follows:

$$\Sigma_{U} = \frac{1}{\rho^{2}}\,L^{T}\,\Sigma_{U'}\,L \tag{57}$$

Thus, we have obtained two constraint equations in (51), (57) on affine
parameters which are composed of up to second order statistics of the geometry
and the differential properties of the image.

4.2  Solving  for  the  matrix  L

We show how we can solve for the affine transformation $L$, combining the
constraints of the geometry and the differential properties of the image. We
anticipate that in practice, due to the limited dynamic range of the sensor
device as well as its spatial resolution, the geometrical constraint would
probably be more reliable than the differential constraints. Therefore, we
incorporate all the three geometrical equations given in (51) with one of the
three differential constraints given in (57) to get a solution for $L$. But, for
the purpose of stability, we will try all the possible combinations of the set
of the three from (51) with every one of (57), and choose the best-fit match in
terms of the alignment of the model with the image data, just as in the case of
using the affine invariant.

Combining (51) and (57) we immediately get:

$$\rho^{4} = \frac{\det[\Sigma_{X'}]\,\det[\Sigma_{U'}]}{\det[\Sigma_{X}]\,\det[\Sigma_{U}]} \tag{58}$$


Since covariance matrices are positive definite and symmetric, it is not hard
to see from equation (51) that $L$ can be written as:

$$L = \Sigma_{X'}^{1/2}\,Q\,\Sigma_{X}^{-1/2} \tag{59}$$

where $\Sigma_{X'}^{1/2}$, $\Sigma_{X}^{1/2}$ are the respective positive definite symmetric
square root matrices of $\Sigma_{X'}$, $\Sigma_{X}$, that are unique [8], and $Q$ is an
orthogonal matrix, accounting for the remaining one degree of freedom.
Considering the fact that
$0 < \det[L] = \det[\Sigma_{X'}^{1/2}]\,\det[Q]\,\det[\Sigma_{X}^{-1/2}]$ we know that $Q$
must be a rotation matrix, so that $Q$ may be written as:

$$Q = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \tag{60}$$


thus we have:

$$L = \begin{pmatrix} e_{11}\cos\theta + f_{11}\sin\theta & e_{12}\cos\theta + f_{12}\sin\theta \\ e_{21}\cos\theta + f_{21}\sin\theta & e_{22}\cos\theta + f_{22}\sin\theta \end{pmatrix} \tag{61}$$

where the coefficients $e_{ij}$, $f_{ij}$ are composed of the elements of
$\Sigma_{X'}^{1/2}$ and $\Sigma_{X}^{-1/2}$ and are uniquely determined by (59).
Substituting (61) into each of the two equations (we have already used one for
solving for $\rho$) in (57) yields:

$$k_{ij}(\cos\theta)^{2} + 2\,l_{ij}(\cos\theta)(\sin\theta) + m_{ij}(\sin\theta)^{2} = \rho^{2} p_{ij} \tag{62}$$

where $k_{ij}$, $l_{ij}$, $m_{ij}$ are the respective coefficients of $(\cos\theta)^{2}$,
$(\cos\theta)(\sin\theta)$, and $(\sin\theta)^{2}$ in the $ij$ components of the resulting
matrices on the left hand side, that are composed of the coefficients $e_{pq}$,
$f_{rs}$ and the elements of $\Sigma_{U'}$, and $p_{ij}$ is the corresponding element of
$\Sigma_{U}$ on the right hand side. Solving equation (62) we get:

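$$\cos\theta = \pm\sqrt{\frac{\nu_{c}}{\delta_{c}}} \tag{63}$$

$$\sin\theta = \pm\sqrt{\frac{\nu_{s}}{\delta_{s}}} \tag{64}$$

where,

\begin{align}
\nu_{c} &= 2l^{2} + (m - \rho^{2}p)(m - k) \pm \sqrt{4l^{2}\{l^{2} - (\rho^{2}p - m)(\rho^{2}p - k)\}} \tag{65}\\
\delta_{c} &= (m - k)^{2} + 4l^{2} \tag{66}\\
\nu_{s} &= 2l^{2} + (k - \rho^{2}p)(k - m) \pm \sqrt{4l^{2}\{l^{2} - (\rho^{2}p - k)(\rho^{2}p - m)\}} \tag{67}\\
\delta_{s} &= (k - m)^{2} + 4l^{2} \tag{68}
\end{align}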


where indices have been suppressed for simplicity. By substituting this back
into (61), we finally obtain the four possible candidates for $L$. To select the
best one out of this candidate set, we will try out all the candidates using
the alignment approach and pick the one that fits best.
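
The recovery of $L$ from the two covariance constraints can be sketched numerically as follows. This is not the closed-form route of (62)-(68): as a stated substitution, the sketch uses the full matrix form of (57) and aligns principal axes, which is simpler to write down and leaves two candidates ($L$ and $-L$) rather than four. All names and numerical values are hypothetical, and the covariances are generated synthetically from (51) and (57).

```python
import numpy as np

def spd_sqrt(s):
    """Symmetric positive definite square root via eigendecomposition."""
    w, v = np.linalg.eigh(s)
    return (v * np.sqrt(w)) @ v.T

def axis_angle(m):
    """Orientation of the major principal axis of a symmetric 2x2 matrix."""
    return 0.5 * np.arctan2(2.0 * m[0, 1], m[0, 0] - m[1, 1])

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def recover_candidates(Sx, Sxp, Su, Sup):
    """Return rho and candidate matrices L consistent with (51) and (57)."""
    det = np.linalg.det
    rho = (det(Sxp) * det(Sup) / (det(Sx) * det(Su))) ** 0.25   # from (58)
    A, C = spd_sqrt(Sx), spd_sqrt(Sxp)
    # With L = C Q A^{-1} as in (59), constraint (57) becomes
    #   Q^T (C Sup C) Q = rho^2 (A Su A),
    # so Q rotates the principal axes of A Su A onto those of C Sup C.
    theta = axis_angle(C @ Sup @ C) - axis_angle(A @ Su @ A)
    A_inv = np.linalg.inv(A)
    # The axis orientation is only defined modulo pi, hence two candidates.
    return rho, [C @ rotation(theta + k * np.pi) @ A_inv for k in (0, 1)]

# Synthetic check with hypothetical values.
L_true = np.array([[1.2, 0.4], [-0.3, 0.9]])       # det > 0
rho_true = 1.5
Sx = np.array([[4.0, 1.0], [1.0, 2.0]])
Sup = np.array([[3.0, -0.5], [-0.5, 1.0]])
Sxp = L_true @ Sx @ L_true.T                       # (51)
Su = L_true.T @ Sup @ L_true / rho_true**2         # (57)

rho, candidates = recover_candidates(Sx, Sxp, Su, Sup)
errors = [np.abs(c - L_true).max() for c in candidates]
print(rho, min(errors))
```

As in the text, the remaining ambiguity between candidates would be resolved by aligning the model with the image data. The sketch breaks down when either transformed gradient covariance is isotropic, since its principal axis is then undefined.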

The advantage of using gradient distributions of the image functions, compared
with using only geometrical properties, is that their covariances may not be as
strongly disturbed by local missing regions or occlusions. We demonstrate this
below using experiments on natural images. In this section we described a
method that combines differential and geometrical properties of the image, but
we might be able to derive a different method for recovering the affine
parameters if we had more than one reliable image attribute. By combining the
constraints from two such image attributes, instead of incorporating geometry,
we may be able to develop a method that would be less affected by missing
regions.

Since the major disadvantage of the use of global features such as moments is
the apparent sensitivity to local disturbances, this approach (that is, the use
of differential properties) could be a key issue for improving the stability of
the algorithms. In the Appendix we also show, at least mathematically, that
even a single point correspondence between the model and data 2D views suffices
to recover affine parameters, if some invariant image function is available
under the change of orientation of the surface.

[Summary] 

In this section so far, we have mathematically derived a constraint equation on
affine parameters based on the differential properties of the image in terms of
its second order statistics. Then, combining this constraint with the canonical
geometric constraint (again in terms of second order statistics), we have shown
how we can solve for the affine parameters by a direct computation.

4.3  Results  using  differential  properties  on 
natural  pictures 

Results  using  the  algorithm  via  combination  of  the  ge¬ 
ometrical  and  differential  properties  of  the  image  are 
shown  on  the  same  natural  pictures  used  in  the  earlier 
experiments  for  the  method  based  on  affine  invariants. 
We  used  the  gradient  of  the  gray  level  (brightness)  im¬ 
age  function  for  the  differential  data.  Note  that  even 
though  the  picture  given  in  the  following  shows  only  the 
data  for  the  manually  extracted  region  used  for  recogni¬ 
tion,  we  actually  use  the  original  image  when  calculating 
the  gradient  at  each  point.  As  a  result,  the  artificially 
introduced  edges  of  the  extracted  region  do  not  have 
any  effect  on  the  derivation  of  the  gradient  distribution. 
Note  that  this  is  very  important  in  demonstrating  the  ef¬ 
fectiveness  of  our  method,  because  otherwise  larger  con¬ 
tributions  on  the  covariances  of  gradient  distributions 
would  be  made  by  the  artificially  constructed  edges. 

Figure 7 shows the results on the Cocoa-Box pictures. The left and right
figures in the upper row show the respective gradient distribution (the
horizontal axis is $f_{x}$ and the vertical axis is $f_{y}$) for the model and the
data views. The lower figure shows the reconstructed image data from the model
view by the affine transformation that was recovered. We expect this figure to
coincide with the corresponding portion of the upper right picture in Figure 2.
From the figure, we see that the algorithm performed almost perfectly.

In Figure 8 the results on the Baby-Wipe container pictures are given. The left
and right figures in the upper row show the respective gradient distribution
for the model and the data view. The lower figure is the reconstructed image
data. We expect this to coincide with the corresponding portion of the upper
right picture of Figure 3. The accuracy is again fairly good, although not as
good as that obtained by affine invariant plus second order weighted moments.
Likewise, Figure 9 shows the results on the Tea-Box pictures. The result is
almost as good as that obtained using affine invariant.

In Figure 10, we show the reconstructed image data given the perturbation in
the original image. We used the same data as that used in the sensitivity tests
for the affine invariant method. The figures show the respective results for
the fraction of missing region: 5% (left), 15% (middle), 25% (right). In Figure
11, the values of $\delta$ (vertical axis), accuracy in recovering affine
parameters, are plotted against the percentage of the missing region
(horizontal axis) in the given image data. We compared these results with those
obtained by the affine invariant method presented previously. Apparently, the
results by the differential method (plotted as blocks) are less sensitive to
perturbations than those obtained by the affine invariant method (plotted as
stars). Probably, this is due to the use of differential distribution as
described previously.

5  Conclusion 

In  this  paper,  we  proposed  new  algorithms  for  3D  ob¬ 
ject  recognition  that  provide  closed-form  solutions  for 
recovering  the  transformations  relating  the  model  to  the 
image.  We  proposed  two  different  algorithms:  The  first 
one  is  based  on  the  affine  plus  rotation  invariants  using 
no  higher  than  second  or  third  order  moments  of  the  im¬ 
age.  Some  results  on  natural  pictures  demonstrated  the 
effectiveness  of  the  proposed  algorithm.  An  error  analy¬ 
sis  was  also  given  to  study  the  sensitivity  of  the  algorithm 
to  perturbations.  The  second  algorithm  used  differential 
properties  of  the  image  attribute.  Results  demonstrated 
that  the  use  of  differential  properties  of  image  attributes 
allows  a  recovery  of  the  parameters  that  is  insensitive 
to  missing  regions  in  the  given  image.  This  suggested  a 
new  direction  of  object  recognition  in  the  sense  that  it 
may  provide  a  robust  technique  using  global  features  for 
recovering  transformations  relating  the  model  to  the  im¬ 
age.  Differential  properties  have  been  extensively  used 
in  motion  analysis  (e.g.,  [5]),  but  limited  to  infinitesimal
motions  of  the  object.  In  contrast  to  the  case  of  motion 
analysis,  our  case  is  not  limited  to  infinitesimal  motion. 
The  new  method  can  deal  with  any  motion  of  the  planar 
surface,  as  long  as  the  change  of  the  image  attribute  is 
constrained  within  a  scale  factor  at  each  position  on  the 
object.  Though  all  the  demonstrations  were  only  on  pla¬ 
nar  patches,  as  we  described,  it  can  connect  with  the  full 


3D  model  of  the  object  to  recover  the  full  3D  information 
via  direct  computation. 
Appendix:  Recovering  affine  parameters
via  single  point  correspondence


In this appendix we give theoretical arguments showing that even a single point
correspondence between two different views suffices to recover the affine
parameters by using differential properties of the image. To do this we assume
that we have a nice image attribute (function) $I$ which has the perfect
invariant property between different views such that $I(X) = I'(X')$ and
$I \in C^{2}$, where $X' = LX$.

(Comments:  This  complete  invariance  assumption  may 
seem  to  be  unrealistic  in  practice.  But,  again,  as  argued 
in  [14]  when  the  ambient  light  is  not  changed  it  is  known 
that  the  ratios  of  the  sensor  outputs  of  different  channels 
are  invariant  if  the  sensors  are  narrow  band.) 

Taking  the  gradient  of  I  we  have: 


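$$I_{x} = L_{11} I'_{x'} + L_{21} I'_{y'}, \qquad I_{y} = L_{12} I'_{x'} + L_{22} I'_{y'} \tag{69}$$

Deriving the second order derivatives, we have:

\begin{align}
I_{xx} &= L_{11}^{2} I'_{x'x'} + 2 L_{11} L_{21} I'_{x'y'} + L_{21}^{2} I'_{y'y'} \tag{70}\\
I_{xy} &= L_{11} L_{12} I'_{x'x'} + (L_{11} L_{22} + L_{12} L_{21}) I'_{x'y'} + L_{21} L_{22} I'_{y'y'} \tag{71}\\
I_{yy} &= L_{12}^{2} I'_{x'x'} + 2 L_{12} L_{22} I'_{x'y'} + L_{22}^{2} I'_{y'y'} \tag{72}
\end{align}

From (69) we get $L_{21} = (I_{x} - L_{11} I'_{x'}) / I'_{y'}$, and substituting this
into (70) and rearranging we have

$$\left( I'_{x'x'} (I'_{y'})^{2} - 2 I'_{x'} I'_{y'} I'_{x'y'} + I'_{y'y'} (I'_{x'})^{2} \right) L_{11}^{2}
 - 2 \left( -I'_{y'} I'_{x'y'} + I'_{x'} I'_{y'y'} \right) I_{x} L_{11}
 + I_{x}^{2} I'_{y'y'} - (I'_{y'})^{2} I_{xx} = 0 \tag{73}$$

Likewise, we have a similar equation for $L_{22}$ (and $L_{12}$). Then, solving
these quadratic equations we obtain:

\begin{align}
L_{11} &= \frac{(-I'_{y'} I'_{x'y'} + I'_{x'} I'_{y'y'})\, I_{x} \pm |I'_{y'}| \sqrt{\kappa_{1}}}{D}, &
L_{21} &= \frac{I_{x} - L_{11} I'_{x'}}{I'_{y'}}, \notag\\
L_{22} &= \frac{(-I'_{x'} I'_{x'y'} + I'_{y'} I'_{x'x'})\, I_{y} \pm |I'_{x'}| \sqrt{\kappa_{2}}}{D}, &
L_{12} &= \frac{I_{y} - L_{22} I'_{y'}}{I'_{x'}}, \notag
\end{align}

where

\begin{align}
D &= I'_{x'x'} (I'_{y'})^{2} - 2 I'_{x'} I'_{y'} I'_{x'y'} + I'_{y'y'} (I'_{x'})^{2}, \notag\\
\kappa_{1} &= I_{xx} D - I_{x}^{2} \left( I'_{x'x'} I'_{y'y'} - (I'_{x'y'})^{2} \right), \notag\\
\kappa_{2} &= I_{yy} D - I_{y}^{2} \left( I'_{x'x'} I'_{y'y'} - (I'_{x'y'})^{2} \right), \notag
\end{align}

and the signs are chosen so that (71) is also satisfied.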
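
As a consistency check of (69), (70) and (73), the sketch below generates the derivatives of $I$ from those of $I'$ by the chain rule for a known $L$, and verifies that the true $L_{11}$ is a root of the quadratic (73). The derivative values and the matrix are hypothetical, and `np.roots` is used instead of an explicit closed-form solution.

```python
import numpy as np

def model_derivatives(L, grad_p, hess_p):
    """Chain rule for I(X) = I'(L X): gradient and Hessian of I, see (69)-(72)."""
    return L.T @ grad_p, L.T @ hess_p @ L

def l11_roots(grad, hess, grad_p, hess_p):
    """Roots of the quadratic (73) in L11."""
    ix, ixx = grad[0], hess[0, 0]
    px, py = grad_p
    pxx, pxy, pyy = hess_p[0, 0], hess_p[0, 1], hess_p[1, 1]
    a = pxx * py**2 - 2.0 * px * py * pxy + pyy * px**2
    b = (-py * pxy + px * pyy) * ix
    c = ix**2 * pyy - py**2 * ixx
    return np.roots([a, -2.0 * b, c])

# Hypothetical derivatives of I' at the corresponding point, and a known L.
L = np.array([[1.2, 0.4], [-0.3, 0.9]])
grad_p = np.array([0.8, -0.5])                   # (I'_x', I'_y')
hess_p = np.array([[0.6, 0.2], [0.2, -0.4]])     # second derivatives of I'

grad, hess = model_derivatives(L, grad_p, hess_p)
roots = l11_roots(grad, hess, grad_p, hess_p)
l11 = roots[np.argmin(np.abs(roots - L[0, 0]))]  # pick the root nearest the truth
l21 = (grad[0] - l11 * grad_p[0]) / grad_p[1]    # back-substitution via (69)
print(roots, l11, l21)
```

In this check the correct root is selected by comparison with the known $L$; with real data the second root has to be ruled out by the remaining equations.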
Figure 2: Results by affine invariant method on the Cocoa-Box pictures.

The upper row of pictures shows two gray level pictures of the same Cocoa-Box taken from two different view points: the left
view was used for the model, the right view for the data. The left and right figures in the middle row show the corresponding
normalized images. Indeed, we see that the two figures in this row coincide if we rotate the left one by 180 degrees around its
centroid. The left and right figures in the lower row are the respective reconstructed image data from the model view (shown
in the upper left) by the recovered affine transformation using, lower left: affine invariant plus second order weighted
moments of the gray level, lower right: third order moments of the binary image for computing the rotation angle. If the
method works correctly, then those reconstructed images should coincide with the corresponding image portion found in the
upper right figure. Indeed, we see that both of the methods worked very well for recovering the transformation parameters.

Figure 3: Results by affine invariant method on the Baby-Wipe pictures.

The upper row of pictures shows two gray level pictures of a Baby-Wipe container of which the front part was used for the
experiment: the left view was used for the model, while the right view was used for the data. The left and right figures in
the middle row show the respective normalized images. Indeed, we see that the two figures coincide if we rotate the left
figure by about 180 degrees around its centroid. The bottom figure is the reconstructed image data from the model view by
the recovered affine transformation using affine invariant plus second order weighted moments for computing the rotation
angle. We expect that the reconstructed image coincides well with the image in the upper right.

Figure 4: Results by affine invariant method on the Tea-Box pictures.

The upper row shows the pictures of a Tea-Box: the left view was used for the model, while the right view was used for the data.
The left and right figures in the middle row are the respective normalized images up to a rotation. The left and right figures
in the lower row show the respective reconstructed image data from the model view using the recovered affine transformation
based on affine invariant plus second order weighted moments of the gray level (left) and third order moments of the binary
image (right) for recovering the rotation angle. From the figure, we see that both of the reconstructed images coincide well
with the original data shown in the upper right. Though both the methods worked fairly well, the method using second
order weighted moments performed slightly better. Considering that both of the reconstructed images are tilted a little bit
in a similar manner, perhaps some errors were introduced in the manual region extraction.
Figure 5: Sensitivity analysis against perturbations in the given image.

The upper pictures show examples of the perturbed image data for which some percentage of the image region was dropped:
left 5%, middle 15%, right 25%. The lower pictures show the respective reconstructed image data. The perturbations in the
image data were produced by dropping particular connected regions from the (almost) perfect image data.

Sensitivity of affine invariant method against perturbations in moments


Figure 6: Sensitivity of the recovered parameters by affine plus rotation invariants against perturbations.
The horizontal axis is $\sigma$ while the vertical axis is $\delta$. The value of $\delta$, accuracy in recovering affine parameters, is almost
proportional to $\sigma$, the perturbations, when it is small, but the slope increases rapidly as $\sigma$ increases.
Figure 7: Results by combination of geometric and differential properties on the Cocoa-Box pictures.

The left and right figures in the upper row show the respective gradient distribution (the horizontal axis is $f_{x}$ and the
vertical axis is $f_{y}$) for the model and the data views. The lower figure shows the reconstructed image data from the model
view by the affine transformation that was recovered. We expect this figure to coincide with the corresponding portion of the
upper right picture in Figure 2. From the figure, we see that the algorithm performed almost perfectly.

Figure 8: Results by combination of geometric and differential properties on the Baby-Wipe pictures.

The left and right figures in the upper row show the respective gradient distribution for the model and the data view. The
lower figure is the reconstructed image data, that we expect to coincide with the corresponding portion of the upper right
picture of Figure 3. The accuracy is again fairly good, though not as good as that obtained by affine invariant plus
second order weighted moments.

Figure 9: Results by combination of geometric and differential properties on the Tea-Box pictures.

The left and right figures in the upper row show the respective gradient distribution for the model and the data view. The
lower figure is the reconstructed image data, that we expect to coincide with the corresponding portion of the upper right
picture of Figure 4. The result is almost as good as the one obtained using affine invariant.


Figure 10: Sensitivity of differential method against perturbations.

The figure shows the reconstructed image data for the same perturbed images as those used in the sensitivity tests for the affine
invariant method. The pictures show the respective results for the perturbation percentage in the given image: left 5%, middle
15%, right 25%.

Sensitivity of the algorithms against perturbations: Errors vs. Missing Region


Figure 11: Sensitivity of the recovered parameters by differential method against perturbations.

The values of $\delta$ (vertical axis), accuracy in recovering affine parameters, are plotted against the percentage of the
perturbation in the given image data (horizontal axis). The results by affine invariant are plotted using blocks, while those
by differential method are plotted using stars. Apparently, the results by differential method are less sensitive to
perturbations than those by affine invariant method.